Conversation

@bsatoriu (Contributor) commented Dec 2, 2025

Synchronize the latest maap staging.values.yaml updates to prod.values.yaml.

github-actions bot commented Dec 2, 2025

Merging this PR will trigger the following deployment actions.

Support deployments

| Cloud Provider | Cluster Name | Reason for Redeploy |
| --- | --- | --- |
| aws | victor | Support helm chart has been modified |
| aws | openscapeshub | Support helm chart has been modified |
| aws | maap | Support helm chart has been modified |
| gcp | 2i2c-uk | Support helm chart has been modified |
| aws | earthscope | Support helm chart has been modified |
| aws | projectpythia | Support helm chart has been modified |
| aws | jupyter-health | Support helm chart has been modified |
| gcp | catalystproject-latam | Support helm chart has been modified |
| kubeconfig | projectpythia-binder | Support helm chart has been modified |
| gcp | 2i2c | Support helm chart has been modified |
| aws | disasters | Support helm chart has been modified |
| gcp | dubois | Support helm chart has been modified |
| gcp | leap | Support helm chart has been modified |
| aws | oceanhackweek | Support helm chart has been modified |
| aws | aimatx-2i2c-hub | Support helm chart has been modified |
| aws | reflective | Support helm chart has been modified |
| aws | opensci | Support helm chart has been modified |
| gcp | awi-ciroh | Support helm chart has been modified |
| aws | 2i2c-aws-us | Support helm chart has been modified |
| aws | nasa-ghg-hub | Support helm chart has been modified |
| kubeconfig | 2i2c-jetstream2 | Support helm chart has been modified |
| aws | nmfs-openscapes | Support helm chart has been modified |
| gcp | cloudbank | Support helm chart has been modified |
| kubeconfig | utoronto | Support helm chart has been modified |
| aws | catalystproject-africa | Support helm chart has been modified |
| aws | nasa-veda | Support helm chart has been modified |
| gcp | hhmi | Support helm chart has been modified |
| aws | ubc-eoas | Support helm chart has been modified |
| aws | temple | Support helm chart has been modified |
| aws | smithsonian | Support helm chart has been modified |
| aws | bnext-bio | Support helm chart has been modified |
| aws | berkeley-geojupyter | Support helm chart has been modified |
| aws | nasa-cryo | Support helm chart has been modified |
| aws | strudel | Support helm chart has been modified |

Staging deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | maap | staging | Following helm chart values files were modified: staging.values.yaml, common.values.yaml |
| aws | earthscope | staging | Following helm chart values files were modified: common.values.yaml |
| gcp | 2i2c-uk | staging | Following prod hubs require redeploy: lis |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | maap | prod | Following helm chart values files were modified: prod.values.yaml, common.values.yaml |
| gcp | 2i2c-uk | lis | Following helm chart values files were modified: lis.values.yaml |
| aws | earthscope | prod | Following helm chart values files were modified: common.values.yaml |
| aws | earthscope | binder | Following helm chart values files were modified: common.values.yaml |

@grallewellyn (Contributor)

@bsatoriu, the images should be pointing to OPS, not DIT, for prod, and we want to add QGIS back in. If you give me write access to your fork I can make the updates; that saves me from having to make my own PR today.

@bsatoriu (Contributor, Author) commented Dec 2, 2025

> @bsatoriu, the images should be pointing to OPS, not DIT, for prod, and we want to add QGIS back in. If you give me write access to your fork I can make the updates; that saves me from having to make my own PR today.

I added you, @grallewellyn.

@grallewellyn (Contributor)

Thanks, Brian! I made the necessary updates.

@yuvipanda (Member) left a comment

To prevent drift between staging and prod, we want to keep config in common.yaml as much as possible. So the workflow can be:

  1. Test things in staging via staging-specific config
  2. When they are ready, move them to common.yaml so that it becomes the behavior of both staging and prod

This way we minimize the config that's prod-specific, and can use staging to validate issues.

Can you move most of the config to common.yaml rather than prod.yaml, and I'll merge?
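
A minimal sketch of that workflow, assuming the chart merges common.values.yaml with the per-hub staging/prod files as described in this thread; SOME_FEATURE_FLAG is a hypothetical placeholder and the nesting above singleuser is abbreviated:

```yaml
# staging.values.yaml -- while a change is being validated, it lives only here
singleuser:
  extraEnv:
    SOME_FEATURE_FLAG: "on"   # hypothetical setting under test
---
# common.values.yaml -- once validated on staging, the same block moves here,
# so both the staging and prod hubs pick it up on their next deploy
singleuser:
  extraEnv:
    SOME_FEATURE_FLAG: "on"
```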

@bsatoriu (Contributor, Author) commented Dec 3, 2025

> To prevent drift between staging and prod, we want to keep config in common.yaml as much as possible. So the workflow can be:
>
>   1. Test things in staging via staging-specific config
>   2. When they are ready, move them to common.yaml so that it becomes the behavior of both staging and prod
>
> This way we minimize the config that's prod-specific, and can use staging to validate issues.
>
> Can you move most of the config to common.yaml rather than prod.yaml, and I'll merge?

This is complete.

@yuvipanda (Member)

@bsatoriu I still see the changes in prod.yaml.

@yuvipanda (Member) commented Dec 3, 2025

Cross-posting from Slack, after @bsatoriu nudged me to point out that the changes to prod are actually required.


The primary difference between staging and prod after your latest changes is:

  1. env vars set to differentiate between staging and prod
  2. the images themselves (prod has pinned tags, staging has 'develop')

In our experience, (2) is often used to 'test' new images before they get rolled out. This is often better achieved by having people type the image tag into the 'unlisted image' option in prod, and keeping staging and prod on the exact same images instead. This lets people test new images in prod without affecting others, and makes sure staging and prod are as close a match as possible, rather than using staging as almost a 'development' instance. In the future, for example, if you're experimenting with s3fuse on staging, making sure the images are the same as prod's cuts down on a lot of potential issues when ramping up.

So my suggestion is:

  1. The env vars should be put in a place that's common to all profiles: singleuser.extraEnv. MAAP_API_HOST seems to be the same for both staging and prod, so it can go in common.yaml. WORKSPACE_BUCKET seems different, so it can go in the respective staging or prod yaml files (since singleuser.extraEnv is a dict, it'll be merged). DOCKERIMAGE_PATH_DEFAULT and DOCKERIMAGE_PATH_BASE_IMAGE seem to be the names of the images used, which are also available as $(JUPYTER_IMAGE) (that's Kubernetes syntax - see https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/), so you can set those in common.yaml as well. This also allows people to experiment with images using unlisted choice without needing changes here.
  2. Keep all the images pinned to tags, and put them in common.yaml. For testing new images, use unlisted choice. (A sketch of this layout follows the list.)
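
Under those suggestions, the split could look roughly like this. The staging bucket name is illustrative (only the prod value appears in this PR), and the nesting above singleuser is abbreviated:

```yaml
# common.values.yaml -- shared by staging and prod
singleuser:
  extraEnv:
    MAAP_API_HOST: api.maap-project.org
    DOCKERIMAGE_PATH_DEFAULT: "$(JUPYTER_IMAGE)"   # resolves to the running image; see the review thread below
---
# prod.values.yaml -- only what genuinely differs; extraEnv dicts get merged
singleuser:
  extraEnv:
    WORKSPACE_BUCKET: maap-ops-workspace
---
# staging.values.yaml
singleuser:
  extraEnv:
    WORKSPACE_BUCKET: maap-staging-workspace   # illustrative placeholder
```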

Review thread on this hunk:

```yaml
extraEnv:
  SCRATCH_BUCKET: s3://maap-scratch-prod/$(JUPYTERHUB_USER)
  MAAP_API_HOST: api.maap-project.org
  DOCKERIMAGE_PATH_DEFAULT: mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0
```
Member:

This will set the environment variable to this value no matter what image is used. Is that what was expected? In the last PR, I saw this was set to be the same as the name of the image, in which case it should use $(JUPYTER_IMAGE) as the value.
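
Concretely, the two options being contrasted here; the second assumes, per this comment, that KubeSpawner exposes the running image to the pod as JUPYTER_IMAGE:

```yaml
# Hardcoded, as in the hunk above: the env var keeps pointing at maap_base:v5.0.0
# even when the pod was launched from a different profile image.
DOCKERIMAGE_PATH_DEFAULT: mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0
---
# Dependent variable: $(JUPYTER_IMAGE) is Kubernetes syntax that is substituted at
# pod creation with the value of JUPYTER_IMAGE, i.e. whatever image the pod runs.
DOCKERIMAGE_PATH_DEFAULT: "$(JUPYTER_IMAGE)"
```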

Contributor:

Okay, extracted out DOCKERIMAGE_PATH_BASE_IMAGE!

Review thread on this hunk:

```yaml
    WORKSPACE_BUCKET: maap-ops-workspace
  nodeSelector:
    2i2c/hub-name: prod
  profileList:
```
Member:

Normally, we would like to keep profileLists in common.yaml, and use the same image in staging and prod. The staging hub here is primarily for testing infrastructure changes, and we (2i2c) would like to generally keep it the exact same as prod, so that if we have tested something on staging, we're 99% confident it will work in prod.

Having different images in staging and prod could cause problems here, since the images being different could cause failures when migrating. It could also cause the other parts of the profileList (such as resource config) to drift out of sync between the two.

However, we also recognize that you probably want to test out different images as you're onboarding an existing user base to this hub, and we want to be flexible.

So I see two paths forward:

  1. Use the same image tags for staging and prod, and put them in common.yaml. Image testing happens purely via unlisted choice. This is the preferred way, and also where we should go long term.
  2. If (1) doesn't fit with your existing workflows for building images, leave a block comment above the profileList config in staging and prod, documenting that it's duplicated and that whoever modifies it should take care that the only differences between the two are the image tags, with everything else kept in sync manually (a sketch of such a comment follows below). We can then revisit this in 3-6 months, after the initial migration is completed and the pace of image changes has changed.

I wanna unblock y'all asap, so while I have a preference for (1), I'm happy to do either.
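
If option (2) is taken, the duplicated sections could carry a note along these lines (the wording is only a suggestion):

```yaml
# NOTE: this profileList is intentionally duplicated between staging.values.yaml
# and prod.values.yaml. The ONLY intended difference between the two copies is
# the image tags; resource options, display names, and everything else must be
# kept in sync by hand. Revisit consolidating into common.values.yaml once the
# MAAP migration has settled.
profileList:
  # ...
```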

Contributor:

Sorry for jumping into this conversation as I come back from leave.

Let me know if I am phrasing this correctly: you are saying that staging and prod are meant for infrastructure testing and everything else remains the same. In that case, we (MAAP), as tenants of this infrastructure, should be deploying three versions of your prod configuration for our own customers and venues (DIT, UAT and OPS). The tenant should not need to worry about changes in your staging environment. We should be able to deploy multiple 2i2c prod environments with different MAAP configurations for our testing.

Does that make sense?

Contributor:

On MAAP, the DIT, UAT and OPS venues come with their associated deployments of the API and data-processing clusters, which affect the Jupyter extensions used in the images. So in terms of testing, we are not just testing the images, but the entire deployment venue, which is isolated in its own cloud environment.

Contributor:

I added a block comment above profileList, and we would like to go with option 2.

Member:

Thanks @grallewellyn! I've retitled the PR slightly and merged this!

@sujen1412 I opened #7233 to split off the other conversation so we don't lose track of it!

@yuvipanda changed the title from "Update maap prod.values.yaml" to "Promote MAAP staging hubs to prod" on Dec 4, 2025
@yuvipanda self-requested a review on December 4, 2025 01:16
@yuvipanda merged commit 09a9bd7 into 2i2c-org:main on Dec 4, 2025
44 checks passed
github-actions bot commented Dec 4, 2025

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/19914343488

@yuvipanda (Member)

The deployment is failing because the image mas.dit.maap-project.org/root/maap-workspaces/2i2c/pangeo:develop doesn't exist. Since it fails in staging, prod deployment will not occur.

This is another reason to keep prod and staging images the same, since staging is supposed to catch issues that affect prod. Here staging has caught an issue, but because the images differ we no longer know whether it affects prod or not (and vice versa: staging may succeed while prod fails).

So in the long run, each production environment should have its own staging where the images are the same.

@grallewellyn
Copy link
Contributor

Since the only difference between staging and production is the image tags, the only way the pipeline could fail is if the image doesn't exist. If the image doesn't exist, we can quickly push it and rerun the deployment. Once we get the hang of the 2i2c deployment process, this won't be an issue again.

If there is something wrong with the images themselves, then they won't launch on 2i2c, but that is a different issue.
